Blood RNA biomarkers of coronary artery disease

ABSTRACT

The present invention discloses a unique panel of gene transcripts down-regulated in patients with coronary artery disease (CAD). Methods and compositions for detecting CAD in patient blood samples are provided.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Application No. 62/382,668 filed Sep. 1, 2016, the entire disclosure of which is incorporated herein by reference.

TECHNICAL FIELD

The invention relates to diagnosing coronary artery disease (CAD) in a patient, the method comprising analyzing blood samples drawn from the patient and determining the levels of a panel of bio-markers indicative of CAD.

BACKGROUND OF THE INVENTION

There are more than a million heart attacks each year, and 2,200 Americans die of heart disease each day, about 1 every 39 seconds. Outward symptoms of coronary artery disease (CAD) are chest pain, typically radiating down the left arm, and shortness of breath (dyspnea) upon exertion. However, chest pain and dyspnea alone are not particularly specific warning signs. In a prospective analysis of patients presenting with chest pain, cases were ultimately determined to be musculoskeletal (20%) or gastroesophageal reflux disease (GERD) (13%), while CAD was diagnosed in 11% of cases, and the remaining cases were either pulmonary, neurological, or idiopathic. The Framingham risk factors of advancing age, male gender, elevated cholesterol, smoking, and hypertension are very good predictors of long term risk (30 yr. risk, C statistic=0.803), but they are far less accurate in acute clinical settings at determining whether a person has CAD or not (C statistic=0.667, where 0.5 is random chance). Thus, there is a tremendous need for improvement in the diagnosis of CAD. From the >1 million cardiac catheterizations yearly, 622,000 result in interventions such as stent placement. Despite overt symptoms and other clinical tests, such as stress electrocardiograms, suggestive of CAD, 20-40% of angiograms do not detect any occluded arteries. The American College of Cardiology's Registry, covering 398,978 patients, identified 39.2% of angiograms with less than 20% stenosis. Thus, a blood test for CAD would have the potential to avoid as many as 400,000 needless catheterizations per year. Previously, we have shown that expression profiling of blood can be used to diagnose which patients will reocclude on bare metal stents, and thus are likely to benefit from a drug eluting stent. Thus, it is both feasible and important to extend RNA profiling to the diagnosis of coronary artery disease itself.

Several prior studies using microarrays suggested that there was an RNA signature in blood associated with CAD. However, the agreement between these studies on exactly which transcripts are modulated is quite low. Such discrepancies could have several explanations, but they likely arise from the noise created by highly abundant signals, such as globins, which can overwhelm true signals in microarrays and thus potentially mask true changes of low magnitude. Thus, we have employed a more advanced, single-molecule RNA sequencing (RNAseq) methodology to help identify diagnostic transcripts. Using RNAseq of whole blood RNA, we identify a subset of transcripts that are consistent with the potential role of T regulatory (Treg) cell dysfunction as a correlate of CAD.

BRIEF DESCRIPTION OF THE INVENTION

The invention takes advantage of the fact that up to 40% of patients that undergo coronary catheterization actually do not have meaningful coronary blockage. By comparing the mRNA expression pattern of patients with vs. those without CAD, transcripts associated with CAD (TRACs) were identified. The strength of this model is that blood can be taken prior to the catheterization, and the outcome of the angiography is known within hours, which provides an ideal learning environment for designing a transcriptome-based test. After the coronary angiograms were digitally interpreted by an attending physician, the patients were divided into 3 groups, ≦20% stenosis (low CAD), >20% but <70% stenosis of any vessel (mid CAD), and ≧70% stenosis of any artery (CAD).
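By way of illustration only, the three-group division described above can be sketched in Python as follows. The thresholds are taken from the text; the function name `cad_group` and the label strings are hypothetical and are not part of the disclosed method.

```python
def cad_group(max_stenosis_pct: float) -> str:
    """Assign an angiographic group from the worst stenosis (percent) in any vessel.

    Thresholds follow the text: <=20% is low CAD, >20% and <70% is mid CAD,
    >=70% is CAD. The returned labels are illustrative names only.
    """
    if not 0 <= max_stenosis_pct <= 100:
        raise ValueError("stenosis must be a percentage between 0 and 100")
    if max_stenosis_pct <= 20:
        return "LOW"
    if max_stenosis_pct < 70:
        return "MID"
    return "CAD"
```

For example, a patient whose most occluded vessel shows 45% stenosis would fall in the mid CAD group under this sketch.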

Clinical parameters: Patients presenting for non-emergent, non-ST elevation myocardial infarction (non-STEMI) complaints of chest pain, exertional dyspnea, or other symptoms suggestive of CAD were consented for participation in this study under an IRB approved protocol. Essentially all subjects consented to opt-in for biobanking of their blood, DNA/RNA, and DNA/RNA sequence data. Patients admitted for cardiac catheterization had 3 Tempus blood RNA tubes (3 ml each) (ThermoFisher Scientific, Grand Island, N.Y.) collected by venipuncture prior to sedation. Additional tubes were collected for plasma (EDTA) and buffy coat (CPT). After venipuncture, these studies were purely observational and did not alter the clinical treatment in any way. All relevant clinical data, including a complete blood count (CBC), were captured for comparison to the genomic studies.

Prior to cardiac catheterization, cardiac medical histories were performed by medical professionals to determine CAD risk factors, as defined by accepted guidelines. Hypertension was defined as a history of blood pressure ≧140/90 mmHg and/or treatment with anti-hypertensive medications. Diabetes mellitus was defined by fasting glucose of ≧126 mg/dl and/or use of insulin or oral hypoglycemic agents. Dyslipidemia was defined according to National Cholesterol Education Program Adult Treatment Panel III guidelines or by treatment with lipid lowering medication. Current smoking status was defined by active smoking within 3 months of presentation. A family history of CAD was defined as MI or cardiac death in a first-degree relative.

Chest pain was classified according to standard criteria for angina pectoris as described by Min et al. [Min et al., Am J Med. 128(8):871 (2015)]. Typical angina includes substernal, jaw, and/or arm pain that occurs upon exertion and resolves within 15 minutes of rest and/or use of nitroglycerin. Atypical angina involves 2 of these characteristics. Patients with non-anginal chest pain experience 1 or fewer of these characteristics. Dyspneic patients without chest pain are classified as having typical angina.
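The symptom classification above can be sketched as a simple counting rule. This sketch assumes that the three characteristics are (1) pain in a typical location, (2) onset upon exertion, and (3) relief within 15 minutes of rest and/or nitroglycerin; that decomposition, the function name, and the parameter names are assumptions made for illustration.

```python
def classify_chest_pain(typical_location: bool,
                        exertional: bool,
                        relieved_by_rest_or_nitro: bool,
                        dyspnea_without_chest_pain: bool = False) -> str:
    """Classify symptoms as 'typical', 'atypical', or 'non-anginal'.

    Assumed reading of the criteria: 3 characteristics present is typical,
    2 is atypical, 1 or fewer is non-anginal. Dyspneic patients without
    chest pain are treated as typical, as stated in the text.
    """
    if dyspnea_without_chest_pain:
        return "typical"
    present = sum([typical_location, exertional, relieved_by_rest_or_nitro])
    if present == 3:
        return "typical"
    if present == 2:
        return "atypical"
    return "non-anginal"
```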

From these clinical parameters, the risks were scored according to the method of Min et al., whereby the points accumulated from age, gender, hypertension, diabetes, symptom type (typical/atypical chest pain), family history, and smoking status are compared to an ordinal risk model to predict likelihood of CAD.

Transcriptome profiling: RNA was purified from frozen Tempus-preserved blood samples using Tempus blood isolation columns. After aggressive in-solution DNAse treatment (Turbo™ DNAse, ThermoFisher Scientific), typical RNA yield from 2.5 ml blood averaged ~5 μg, with a BioAnalyzer integrity score >8 (10 is maximal). The DNAse-treated total RNA was depleted of ribosomal RNA (rRNA) by RiboZero (Ambion, ThermoFisher Scientific) leaving ~500 ng of rRNA-depleted RNA. RNA sequencing: For RNAseq, 100 ng of rRNA-depleted RNA was fragmented and analyzed on a Heliscope sequencer (SeqLL, Inc., Woburn, Mass.). The raw reads, typically 40 million at 38 bp average length, were then computationally aligned to the transcriptome or genome using Helisphere software. The number of reads that align to each transcript was counted, yielding Digital Gene Expression (DGE) on ~77K known transcript isoforms (HG19 build). The raw read count is then adjusted by the size of the transcript, so that long transcripts do not appear more highly expressed than short transcripts, and by the number of total reads per sample, to produce “Reads Per Kilobase of transcript, per Million mapped reads” (RPKM). Thus, RPKM corrects the expression level between samples that have different absolute numbers of reads. Although the present examples were based on SeqLL/Helicos single molecule sequencing for RNA sequencing and ddPCR (BioRad, Hercules, Calif.) for quantitation, other methods for determining absolute and relative levels of RNA transcripts including, without limitation, direct RNA sequencing methods, amplification-based methods, and hybridization methods are well known to those of ordinary skill in the art. RPKM levels were migrated to GeneSpring GX13 (Agilent, Santa Clara, Calif.), without additional normalization, to identify transcripts that differ between groups (TRACs).
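For clarity, the RPKM adjustment described above corresponds to the standard formula: read count divided by transcript length in kilobases and by total mapped reads in millions. The following is a minimal sketch of that arithmetic, not the Helisphere implementation; the function and parameter names are illustrative.

```python
def rpkm(read_count: float, transcript_length_bp: float, total_mapped_reads: float) -> float:
    """Reads Per Kilobase of transcript, per Million mapped reads.

    RPKM = reads / (length_bp / 1e3) / (total_reads / 1e6)
         = reads * 1e9 / (length_bp * total_reads)
    """
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)
```

As a hypothetical example, 400 reads aligned to a 2,000 bp transcript in a sample of 40 million mapped reads give an RPKM of 5. Doubling either the transcript length or the sequencing depth, with the read count held fixed, halves the value.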

Comparison of blood RNA preservation/isolation methods: To determine whether TRACs were affected by the type of blood RNA preservation method employed, 3 Tempus and 3 Paxgene tubes (PreAnalytix GmbH, Hombrechtikon, Switzerland) were drawn from subjects at the same time and then RNA was purified according to the manufacturer's protocol. The Paxgene tubes were subjected to an on-column DNAse treatment, as specified by the manufacturer, while Tempus RNA was treated with TurboDNAse prior to analysis. Equal amounts of RNA, quantified by Nanodrop 260/280, were reverse transcribed with random hexamers using the iScript RT kit (RNAseH+) (BioRad), and then analyzed with a set of ‘invariant’ PCR targets (18S ribosomal, beta-actin, GAPDH), targets related principally to lymphocytes (SpiB, CD81, FoxP3), or selected targets related principally to neutrophils (ALPL, DEFA1, IL8RB, IL8, NFkB, cMyc). cDNAs from pooled standard dilutions were included and analyzed in duplicate by droplet digital PCR (ddPCR) on a BioRad QX200 real-time system using the EvaGreen fluorochrome. The abundance of each transcript was expressed relative to one or more ‘invariant’ transcripts for that sample, as specified.

The present biomarkers have several useful applications in current medical practice. Some embodiments include, but are not limited to:

In one embodiment, the predictive panel and methods disclosed herein can replace or reduce the need for coronary angiography. In a preferred embodiment, the RNA biomarkers, or their protein products would be assessed to determine the likelihood that a patient with one or more symptoms of CAD may need interventional testing by coronary angiography. Because angiography requires light sedation, introduction of catheter to the coronary beds, and the injection of contrast media, there are finite and known risks of complications, and considerable expenses that could be avoided if the patient were at low risk of CAD as determined by the present invention.

In one embodiment, the predictive panel and methods disclosed herein can indicate the need for coronary imaging tests, such as magnetic resonance (MR) angiography, or computed tomography (CT) angiography. In situations where coronary imaging instruments are available, the TRACs described here may be used to predict whether such relatively noninvasive tests as MR angiography, or CT angiography would be justified. While less invasive than angiography, CT involves some radiation exposure to the patient. Both MR and CT angiography can involve the use of tracer compounds that could have adverse consequences. Furthermore, due to the high cost of the instruments, MR and CT are not available in many non-urban areas of the U.S., and worldwide.

In another embodiment, the predictive panels and methods disclosed herein can indicate the need for medical intervention. In a preferred embodiment, the TRAC biomarkers are applied to subjects under routine medical care and considered at elevated risk of CAD, and thus prone to untoward events, such as heart attacks, strokes, and aneurysms. In this embodiment, subjects with one or more known risk factors for CAD, such as hypertension, family history, advanced age, and/or elevated cholesterol, would have blood drawn for a TRAC test and the results used to determine the need for medical intervention. Common drug interventions for early CAD include the use of statins to lower cholesterol, low dose aspirin to reduce vascular inflammation and platelet reactivity, and antihypertensive drugs such as ACE inhibitors, diuretics, or beta-blockers. In some cases, tighter control of blood sugar levels, by diet or drugs, can be important. Common lifestyle changes include weight loss, smoking cessation, dietary changes, reduced stress, and increased exercise.

In still another embodiment, the panels of TRAC biomarkers are used in routine physical examination of patients presenting to their primary care physicians. In such an embodiment, the panel of TRACs could be used to identify patients with early CAD, but that are not aware of their disease progression. In fact, almost half of all deaths from CAD-related events are in persons that were apparently unaware that they had CAD. Once identified, the patient's risks could be mitigated by drug and lifestyle interventions as described above.

In one embodiment the TRAC biomarkers comprising a plurality of the transcripts selected from those listed in Tables 1-8 indicate that the subject has an increased likelihood of having or developing coronary artery disease.

In a preferred embodiment, the TRAC biomarkers are selected from the group comprising AML1, CD3E, CD4, CD25, CTLA4, DGKA, DLG1, ETS1, FOXP3, ICOSLG, IL2RB, IL2RG, IL2RA, IKZF4/Eos, RUNX1, SMYD3, TCF3, TRIM28, and ZAP70.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a plot of RNA yield per patient (Y axis) in relation to the lymphocyte count in whole blood (X axis).

FIG. 2 is a plot of RNAseq linearity with differentially expressed transcripts indicated (dark gray points) relative to all transcripts (light gray points).

FIG. 3 is a plot of select Treg marker levels (Y axis) in relation to the level of CAD in each of 3 groups (X axis).

FIG. 4 is a comparison of clinical versus RNA predictors of CAD. Panel A (upper left) shows conventional clinical predictors of CAD plotted for each group, showing Age (decades/10), Gender (% Male), Symptom type (typical/atypical), Diabetes (%), Hypertension (HTN, %), Family History of CAD (%), and current Smoking (%). A cumulative CAD risk score is calculated for each patient based on the method of Min et al., and divided by 10 for graphic purposes. The relationship between the cumulative risk score and CAD was calculated by the Receiver Operator Characteristic (ROC) and a confusion matrix to identify the accuracy of the method (lower left). Panel B (upper right) shows the performance of 7 RNA transcripts, identified by their gene symbols (e.g., DGKA, DLG1), expressed as the RPKM by CAD group. A cumulative score was computed expressing each transcript as a ratio to the mean of its expression in the entire group, to prevent highly expressed transcripts from being over-represented. For graphic purposes, the TRIM28 and Cumulative scores are divided by 10. In the lower right panel, the relationship between the cumulative TRAC score (constant-TRAC, to create positive ROC) and angiographically-confirmed CAD is analyzed by ROC similar to the clinical model for the 48 patients in each group.

FIG. 5 illustrates the effect of RNA collection/stabilizer solution on expression of TRAC and unrelated transcripts. Whole blood RNA was prepared by two different methods of collection and stabilization from the same donors (n=3). RNA was prepared by PaxGene, which is principally based on a cationic detergent CPC, or Tempus, which is based on the strong denaturant guanidine. Identical quantities of DNAse-treated RNA were reverse transcribed using SuperScript III and then quantitatively amplified using droplet-digital PCR (ddPCR, BioRad). Levels of the transcripts (Y axis) shown are the ratio of transcript abundance in Tempus vs Paxgene (X axis), based on absolute quantities calculated from a Poisson distribution of >20K droplets (+sem).
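The Poisson-based quantitation mentioned for FIG. 5 rests on the standard droplet digital PCR relationship: if a fraction p of droplets is positive, the mean number of target copies per droplet is -ln(1 - p). The sketch below illustrates that relationship and a Tempus/Paxgene ratio; it is not the BioRad software, and the function names are hypothetical.

```python
import math

def copies_per_droplet(positive: int, total: int) -> float:
    """Mean target copies per droplet from Poisson statistics: -ln(1 - p)."""
    if total <= 0 or not 0 <= positive < total:
        # positive == total would saturate the assay (log of zero)
        raise ValueError("need 0 <= positive < total and total > 0")
    return -math.log(1.0 - positive / total)

def abundance_ratio(pos_a: int, total_a: int, pos_b: int, total_b: int) -> float:
    """Ratio of copies per droplet in sample A (e.g. Tempus) to sample B (e.g. Paxgene)."""
    denominator = copies_per_droplet(pos_b, total_b)
    if denominator == 0:
        raise ValueError("sample B has no positive droplets; ratio undefined")
    return copies_per_droplet(pos_a, total_a) / denominator
```

Because both samples are read on droplets of the same nominal volume, the droplet volume cancels in the ratio, which is why it does not appear in this sketch.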

DETAILED DESCRIPTION OF THE INVENTION

EXAMPLES

Example 1 Description of biomarkers from midpoint analysis and transcriptome alignment.

Clinical parameters: A total of 113 patients were entered into the study, with 1 patient excluded post-consent due to undetected exclusion criteria of heart failure. Thus, 112 patients were available for analysis, and RNAseq was successfully completed on 96 patients. The following example analyzed the RNAseq result of the first 48 patients, as well as the overall clinical parameters of the entire cohort, in order to determine whether the time and expense of RNAseq was justified on the remaining patients.

Clinical prediction model: The accumulated clinical risk factors yielded a range of 5 to 12 points per patient. A receiver-operator characteristic (ROC) curve was computed using JROCFIT, comparing the clinical risk prediction to a true positive for CAD based on 70% stenosis on angiogram. The fitted ROC curve yielded a C=0.663 with an accuracy of 60.7%, sensitivity of 57.7%, and specificity of 61.6%, for all 112 patients, and similar values of X, Y, Z for the 96 RNAseq+ patients. These values are slightly lower than the published values for this model (C=0.71-0.77), possibly due to differences in the threshold for stenosis (50% there vs 70% here), and comparable to a Diamond-Forrester prediction model (C=0.64).
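For reference, the accuracy, sensitivity, and specificity quoted above are the usual confusion-matrix quantities. The following sketch shows how they are derived from counts; the counts used in the usage note are invented for illustration and are not the study's data.

```python
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, sensitivity, and specificity from confusion-matrix counts.

    tp/fn: CAD patients called positive/negative; tn/fp: non-CAD patients
    called negative/positive.
    """
    total = tp + fp + tn + fn
    if total == 0 or (tp + fn) == 0 or (tn + fp) == 0:
        raise ValueError("need at least one diseased and one non-diseased case")
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }
```

With hypothetical counts of 30 true positives, 20 false negatives, 40 true negatives, and 10 false positives, the sketch returns an accuracy of 0.70, a sensitivity of 0.60, and a specificity of 0.80.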

Analytical parameters: The yield of RNA and the number of reads per patient, and per group did not vary systematically. With 16 additional patients failing at various technical quality control steps in RNA sequencing, RNAseq was completed on 96 patients. Most failures occurred at 3 key steps: low yield from RNA purification or ribosomal depletion, inefficient cDNA synthesis, or low yield of reads from RNAseq. Because of the high loss during ribo-depletion (>90%), there was not always adequate RNA to repeat the RNAseq process.

Sources of variation in RNA yield. Patient blood samples collected with either Paxgene or Tempus RNA preservation tubes show a surprisingly large variation in the RNA yield, with Tempus generally producing higher yield. In the present Tempus-preserved samples, total nucleic acid yield (prior to DNAse treatment) ranged from 0.6 to 35 μg per 3 ml whole blood (mean=10.6 μg). The correlations between RNA yield and any single blood cell count were quite weak, with the strongest correlation to absolute lymphocyte count (r=0.55 with N=112, R²=0.31) (FIG. 1). While one outlier seems to drive this correlation, it actually has a small impact on the correlation (r=0.45 without it). This demonstrates that the white cell counts account for only about 30% of the variability in RNA yield, and suggests there may be important differences in the recovery of RNA between patients.

Initial identification of TRACs at midpoint. As shown in FIG. 2, there is excellent linear quantitative ability of RNA levels over 2.5 log₂ orders of magnitude in RPKM without any artificial normalization of the data. To identify TRACs, the patients were divided into LOW (<20% stenosis), MID (20-69%), and HIGH CAD (>70%) groups. To identify differentially expressed transcripts, we compared the LOW vs. the HIGH CAD group, and required a >1.4-fold change and t test p<0.05 uncorrected, resulting in 238 transcripts, highlighted in dark gray in FIG. 2. This combined fold-change/t test strategy has been established in large, multicenter control studies using spiked samples as a reliable approach to identify true differences. The 238 transcript list is presented in Table 1, wherein the relevant transcripts are identified with several different descriptors that are common in the art, especially the UCSC ID (e.g., uc002rtm.2), maintained by the University of California at Santa Cruz, and the RefSeq ID (e.g., NM_001113494), maintained by the US National Center for Biotechnology Information (NCBI), as well as the consensus Gene Symbol, and Gene Description.
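The combined fold-change/t test filter can be sketched as below. This sketch assumes that the p value has already been computed by a t test elsewhere (the analysis in the text was performed in GeneSpring), and that fold change is measured symmetrically, so that either an increase or a decrease of more than 1.4-fold qualifies. Both the function names and that symmetric reading are assumptions.

```python
from statistics import mean

def fold_change(low_values, high_values) -> float:
    """Symmetric fold change (always >= 1) between the two group means."""
    m_low, m_high = mean(low_values), mean(high_values)
    if m_low <= 0 or m_high <= 0:
        raise ValueError("group means must be positive to form a ratio")
    return max(m_low, m_high) / min(m_low, m_high)

def passes_filter(low_values, high_values, p_value: float,
                  min_fold: float = 1.4, max_p: float = 0.05) -> bool:
    """True when a transcript meets both the fold-change and p-value criteria."""
    return fold_change(low_values, high_values) > min_fold and p_value < max_p
```

A transcript with mean RPKM of 3.0 in the LOW group and 1.5 in the HIGH CAD group (2-fold) and p=0.01 would pass under this sketch, while the same transcript with p=0.2, or a 1.2-fold change at any p value, would not.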

Classification algorithm: Using ANOVA, we re-analyzed all 3 groups and identified 84 differentially expressed transcripts (Table 2). Using Genespring GX13 software, a Support Vector Machine (SVM) algorithm was built which was 100% sensitive and specific. Leave One Out Cross Validation (LOOCV) applied to the same samples found the model was 87.5% accurate at diagnosing CAD (33.3% is random). For comparison purposes, using the CardioDx transcripts (Elashoff, et al., 2011) to build a comparable SVM yielded only 36.7% accuracy (CAD 37.5%, LOW 45.1%, MID 10%, with 33% as random chance). This SVM of the CardioDx transcripts, however, does not include the algorithm they employed. For reference, classifying the 48 patients by gender identified 34 transcripts, mainly from X and Y chromosomes, and yielded an SVM prediction model that was 100% accurate.
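The distinction between accuracy on the training samples and accuracy under LOOCV can be illustrated with a small sketch. Note that a nearest-centroid classifier is substituted here for the SVM built in GeneSpring, purely to keep the example self-contained; the function names and the toy data in the usage note are hypothetical.

```python
def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def predict(train_x, train_y, x):
    """Nearest-centroid prediction (a simple stand-in for the SVM)."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        c = centroid([r for r, y in zip(train_x, train_y) if y == label])
        dist = sum((a - b) ** 2 for a, b in zip(x, c))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def resubstitution_accuracy(xs, ys):
    """Accuracy when the model is tested on the same samples it was built on."""
    return sum(predict(xs, ys, x) == y for x, y in zip(xs, ys)) / len(ys)

def loocv_accuracy(xs, ys):
    """Leave-one-out: each sample is predicted by a model built without it."""
    hits = 0
    for i in range(len(ys)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        hits += predict(train_x, train_y, xs[i]) == ys[i]
    return hits / len(ys)
```

Called with, for example, `xs = [[1.0], [1.2], [0.9], [3.0], [3.1], [2.8]]` and `ys = ["LOW"] * 3 + ["CAD"] * 3`, both functions return a fraction between 0 and 1. On real data the LOOCV figure is generally the lower and more conservative of the two, which is the pattern reported above (100% versus 87.5%).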

Example 2 Description of RNA biomarkers from transcriptome alignment of RNAseq data from 96 patients.

The entire 112 patient cohort was subjected to RNA sequencing using the Helicos (now SeqLL) platform. After RNA purification, ribosomal RNA depletion, and RNA sequencing, it was determined that 96 patient samples were suitable for further analysis.

Discovery and validation sets. Those 96 samples were randomized into a ‘Discovery Set’ of 45 patients and a ‘Validation set’ of 51 patients.
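One way such a randomization could be performed is sketched below. The text does not state how the samples were randomized, so the shuffling procedure, the seed, and the function name are all assumptions made for illustration; only the set sizes (45 and 51) come from the text.

```python
import random

def split_discovery_validation(sample_ids, n_discovery: int = 45, seed: int = 0):
    """Randomly partition samples into a discovery set and a validation set.

    The seed is a hypothetical value used only to make the split reproducible.
    """
    ids = list(sample_ids)
    if not 0 <= n_discovery <= len(ids):
        raise ValueError("n_discovery must not exceed the number of samples")
    rng = random.Random(seed)
    rng.shuffle(ids)
    return ids[:n_discovery], ids[n_discovery:]
```

Applied to 96 sample identifiers with the default arguments, the sketch returns 45 discovery samples and 51 validation samples with no sample in both sets.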

Discovery Set. The 45 patients were subdivided into groups according to the degree of CAD as determined upon the same day as blood was drawn for RNAseq. The groups are described above and are composed of LOW (0-20% stenosis), MID (21-69%) and HIGH (≧70%). To identify a relatively small set of biomarkers, with relatively high confidence, the LOW group was compared to the MID+HIGH groups via a combined T test (p<0.001 uncorrected, and a minimum fold change of 1.5) resulting in 59 transcripts (see Table 3).

A smaller list was constructed of just the top 5 changed transcripts based on both fold-change and a scientific analysis of the transcripts. A PLSD model built on only the 5 transcripts in the DISC group was 84% accurate (91% for LOW, 77% for MID+) (see Table 4).

Because the LOW group includes true ‘normals’ with essentially 0% stenosis and some patients with small amounts of CAD, we conducted another analysis on the Discovery group that re-arranged the patients by ‘normal’ (0%) vs ‘abnormal’ (>0%) and conducted a similar T test yielding 500 transcripts (see Table 5).

Overall prediction models built with the 96 patients. To gain the most powerful model, we included all samples and divided them as in the Discovery/Validation design into LOW vs MID/HIGH (MID+). This yielded 48 patients in each group. Using this design, transcripts were first filtered to exclude transcripts with RPKM<0.01 in 70% of samples, and then TRACs were identified by one-way ANOVA of the LOW vs MID+ patients at an uncorrected p value of 0.001, yielding 198 transcripts (see Table 6). These 198 transcripts contain some duplicates that cover essentially the same RNA transcript, yielding 169 non-redundant, unique RNA transcripts.
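The low-expression filter described above can be sketched as follows. The sketch reads "in 70% of samples" as "in at least 70% of samples"; that interpretation and the function name are assumptions.

```python
def keep_transcript(rpkm_values, min_rpkm: float = 0.01,
                    max_low_fraction: float = 0.70) -> bool:
    """Return False when a transcript is below min_rpkm in at least
    max_low_fraction of samples (assumed reading of the filter), else True."""
    if not rpkm_values:
        raise ValueError("no expression values supplied")
    low = sum(value < min_rpkm for value in rpkm_values)
    return low / len(rpkm_values) < max_low_fraction
```

Under this reading, a transcript with RPKM below 0.01 in 7 of 10 samples would be excluded, while one that is low in 6 of 10 samples would be retained for the ANOVA.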

A PLSD model built on these 198 transcripts was very accurate at discrimination, showing an overall accuracy of 98.9% (100% for LOW, 97.9% for HIGH). This remained fairly robust even with N-fold internal validation, yielding overall accuracy of 80% (77% for LOW, 83% for MID+).

Filtering for higher level expression. By filtering the 198 transcripts for those which had a >20th percentile level in both groups, the list was narrowed to 96 transcripts with relatively higher absolute expression than the 198 transcripts. This, however, did not improve the predictive ability of the PLSD model built on it, with overall accuracy of 93% (92% for LOW, 94% for MID+), but still produced a quite powerful test with fewer transcripts (see Table 7).

Identifying a small set of predictive transcripts. To narrow the list even further, the 198 transcripts predictive in the overall cohort were compared for overlap with the transcripts predictive in the DISC set using the LvM+ criteria (Table 3, 59 transcripts). Common to both lists were 14 transcripts that were used to build a new model for the overall cohort. A PLSD model remained powerful with an overall accuracy of 86% (87% for LOW, 85% for MID+) which dropped to 80% overall accuracy in a LOOCV (see Table 8).

Identifying the most comprehensive list of predictive transcripts. For both diagnostic and potential therapeutic uses, there is value in knowing the most comprehensive list of transcripts that might be employed. Thus, by relaxing the ANOVA stringency to p<0.005 (vs. 0.001 in the prior examples), a larger set of 1132 transcripts is identified (see Table 9).

Interpreting the TRAC signature: Both the midpoint and full analysis showed that almost all of the differentially expressed transcripts were down-regulated in the CAD patients, a pattern that rarely occurs in RNA expression analysis, where there is typically a balance between increased and decreased transcripts. Furthermore, the changes are essentially all of the same magnitude (mean=~1.7 fold), while we would normally expect a range of magnitudes. After extensive analysis of the TRACs, the most likely explanation is that TRACs are markers of a particular cell type, and that cell type is reduced in abundance in persons with developing or existing CAD. An extensive literature documents that two relatively selective cell types can be altered in abundance in patients with CAD.

TRACs as markers of a modified population of circulating cells. One way to obtain the uniform decrease in the transcripts is if these transcripts are preferentially associated with a subset of cells, and that subset is reduced in the CAD patients. Certain transcripts identified in the midpoint analysis, such as CCL28, CD2BP2, CD3E, CD5, CD81, KDM2B, LEF1, and MIB2, have been previously associated with various stem and progenitor cell populations. CD81 is interesting because of its well-characterized role in induced pluripotent stem cells (iPSCs), mesenchymal stem cells (MSCs), and hematopoietic stem cells (HSCs).

There is a substantial literature that consistently reports reductions in circulating progenitor cell (CPC) populations in patients with stable CAD or preclinical atherosclerosis. Hypercholesterolemia, smoking and obesity are associated with reduced levels of circulating progenitors. Conversely, circulating endothelial progenitor cells (EPC) are increased in acute MI cases. However, it is unlikely that a decrease in EPC numbers, which are rare (<1%), could cause the substantial shift in RNA levels detected in whole blood. Rather, both EPC decreases and the RNA profile might be two aspects of some larger change in the cellular composition of blood in atherosclerosis.

TRACs as putative markers of Treg: A second potential explanation for TRACs is that there are known reductions in the T regulatory (Treg) subset of lymphocytes in atherosclerosis. An extensive literature documents reduced Treg abundance, and a relative imbalance in Treg vs T effector (Teff) cells in patients with CAD. To test for the potential changes in Treg, we examined the regulation of known Treg markers in the 3 CAD groups. As shown in FIG. 3, FoxP3, and other markers such as CD4, CD25, ETS1, and RUNX1 mRNA expression levels show a stepwise decrease in expression from LOW, MID, to CAD. By comparison, the expression of an irrelevant marker such as the prostaglandin E receptor 3 (PTGER3) does not show this CAD-related trend.

Treg imbalance is sufficient to create the observed RNA expression pattern. To determine whether a reduction in Treg cell counts in blood would be sufficient in magnitude to produce the observed changes in RNA levels, 8 publications that reported Treg % in normal and CAD patients, such as unstable angina or acute coronary syndrome (ACS), were reviewed and the change in Treg % computed. The average reduction in Tregs, typically defined as CD4+CD25+CD127low by flow cytometry, was 30.25%, which would translate to a 1.43-fold difference in Treg RNA levels, assuming that these markers are essentially unique to Tregs. Thus, the 1.47-fold reduction in mRNA for the consensus FoxP3 marker is quite consistent with the reported reduction in Treg cell numbers in CAD.
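The conversion from a percentage reduction to a fold difference used above is simple arithmetic: if the CAD level is (1 - r) times the normal level, then the normal level is 1/(1 - r) times the CAD level. A minimal sketch follows, with a function name chosen for illustration.

```python
def fold_difference_from_reduction(percent_reduction: float) -> float:
    """Fold difference (normal / reduced) implied by a percentage reduction."""
    if not 0 <= percent_reduction < 100:
        raise ValueError("reduction must be at least 0 and below 100 percent")
    return 1.0 / (1.0 - percent_reduction / 100.0)
```

For the 30.25% average reduction in Tregs, this gives 1/0.6975, or approximately 1.43-fold, in agreement with the figure stated in the text.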

Example 3 Biomarkers identified by their relationship to the Treg-like pathway.

There are no pre-specified pathways or gene ontologies specific to the Treg maturation and differentiation process. Recent review articles by Li and Rudensky, and others, were adapted to create a functional pathway of Treg signaling to compare with the pattern of RNA changes observed. A striking overlap was observed between the DGEs and known Treg signaling.

Biomarkers of CAD can also be defined by their biological relationship to the Treg pathways, and thus we identified a series of markers with a known relationship to Treg differentiation and which varied systematically with CAD status. These include FOXP3, CTLA4, ZAP70 and others (see Table 10).

Using a gene list of just 3 transcripts (FoxP3, Runx1, IL2RA/CD25) associated with relative specificity for Treg cells, but not covered in other TRAC lists, a prediction model was built on the 96 patient cohort whereby 48 subjects are classified as LOW and 48 as MID or HIGH (MID+). Using only these 3 transcripts, a PLSD model showed 60% overall accuracy (67% for LOW, 54% for MID+). However, upon N-fold validation this drops to 46% overall accuracy (47% for LOW, 44% for MID+). This suggests that while the TRACs may be related to Treg numbers or function, the TRACs perform better at predicting CAD than known markers of Treg cells.

Example 4 Prior CAD prediction models.

Other published and patented works have identified transcripts with predictive value for CAD based on Affymetrix array technology and PaxGene blood RNA preservation tubes (U.S. Pat. Nos. 9,122,777 and 8,914,240, the contents of which are incorporated by reference). Prior studies using microarray profiling of PaxGene preserved blood have led to one blood RNA-based diagnostic (Corus CAD by CardioDx). While the CardioDx test (C=0.745) improves slightly on a purely clinical risk assessment (C=0.732), the present RNAseq test proved to be significantly more accurate (C=0.870). Further, the CorusCAD test by CardioDx uses the age and gender of the subject in calculating their risk score, while the present TRAC method is solely based on gene expression levels.

To compare the methods more directly, we identified 17 best matched transcripts from those patents and publications, without knowledge of the expression levels or predictive values in the current dataset. The expression levels of those 17 CardioDx markers did not differ between Low and MID+ patients in our RNAseq dataset (see Table 11). Nonetheless, we attempted to build a classification model using them, and these 17 transcripts in a PLSD model had 73% overall accuracy (75% for LOW, 71% for MID+), but the accuracy fell to 63.5% in N-fold validation, and 50% would be random chance. Thus, while it is difficult to make an accurate head to head comparison, because they are fundamentally different methods, the present TRAC biomarkers are better at predicting CAD than existing RNA biomarkers.

Example 5 A small set of transcripts with high predictive ability.

Based upon the medium-sized TRAC lists, specifically Table 6 (198 transcripts, 169 unique), a smaller panel of transcripts was selected based upon specific criteria for their suitability in human diagnostics. These criteria related to the putative association of the transcript with Treg cells, a high level of expression, and relative independence from close gene family members. Using these criteria, 7 transcripts were selected: DLG1, DGKA, ICOSLG, IKZF4/Eos, SMYD3, TCF3, and TRIM28. The levels of these transcripts were extracted from the RNAseq data, expressed as a percentage (ratio) of the mean expression of that transcript in the entire cohort, to minimize overweighting of highly expressed transcripts, and then the normalized levels of the 7 TRACs were added to make a cumulative score, as shown in FIG. 4 (right panel). The classification of the patients into CAD or low CAD was highly accurate (80.2%) based upon this TRAC score alone, with an ROC area (AUC, C-stat) of 0.873. By comparison, using clinical predictors alone, (FIG. 4 left panels), only 54% accuracy (with 50% as random chance) and a C-stat of 0.636 (with 0.500 as random chance) could be achieved.
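The cumulative TRAC score and its ROC area can be sketched as follows. Each transcript is expressed as a ratio to its cohort mean and the ratios are summed per patient, as described above. The AUC is computed here by direct pairwise comparison, which is a standard equivalent of the area under the ROC curve but is not necessarily the procedure used to obtain the values reported above. Function names are hypothetical.

```python
from statistics import mean

def cumulative_trac_scores(expression: dict) -> list:
    """Sum, per patient, of each transcript's level divided by its cohort mean.

    expression maps a transcript symbol to a list of levels (e.g. RPKM),
    one per patient, in the same patient order for every transcript.
    """
    n_patients = len(next(iter(expression.values())))
    scores = [0.0] * n_patients
    for values in expression.values():
        if len(values) != n_patients:
            raise ValueError("every transcript needs one value per patient")
        cohort_mean = mean(values)
        if cohort_mean <= 0:
            raise ValueError("cohort mean must be positive")
        for i, value in enumerate(values):
            scores[i] += value / cohort_mean
    return scores

def roc_auc(scores, is_cad) -> float:
    """Probability that a CAD patient scores higher than a non-CAD patient
    (ties count as one half); equivalent to the area under the ROC curve."""
    pos = [s for s, y in zip(scores, is_cad) if y]
    neg = [s for s, y in zip(scores, is_cad) if not y]
    if not pos or not neg:
        raise ValueError("need at least one CAD and one non-CAD patient")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Because the TRACs are lower in CAD, the score is inverted before the ROC analysis (compare "constant-TRAC" in the description of FIG. 4), for example `roc_auc([-s for s in scores], is_cad)`. Dividing by the cohort mean gives each transcript an average contribution of 1, so a highly expressed transcript such as TRIM28 does not dominate the sum.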

Example 6 The effect of blood RNA stabilizers on detection of TRACs.

As discussed, other inventions have described gene panels for CAD that differ completely from the present panels. Those studies employed a preservative for blood RNA called PaxGene, which is based on a cationic detergent, while the present studies employed Tempus preservative, which is based upon guanidinium-like denaturants. To determine whether the preservative had an effect on the detection of the TRACs, blood from the same donors was drawn into either PaxGene or Tempus tubes, RNA was prepared according to the manufacturer's directions, and then equal amounts of RNA were quantitated by droplet digital PCR, after reverse transcription with iScript (BioRad). The results, shown in FIG. 5, clearly indicate that all 7 of the TRACs measured were detected better in the Tempus preservative than in the PaxGene preservative, by a range of 1.5-fold (TRIM28) to >7-fold (ICOSLG). Thus, the method of RNA preservation is important to the detection of these TRACs, and the use of differing preservatives may explain some differences between the results.
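
A fold difference of the kind reported here can be computed from paired measurements as sketched below. The source does not state how the per-donor values were summarized, so the geometric mean of per-donor ratios is an assumption, and the function name and copy numbers are hypothetical.

```python
from math import exp, log

def paired_fold_change(tempus_copies, paxgene_copies):
    """Geometric mean of per-donor Tempus/PaxGene ratios for one transcript.

    Both lists hold droplet digital PCR copy numbers for the same donors,
    in the same order. Averaging the ratios on a log scale is an assumption.
    """
    ratios = [t / p for t, p in zip(tempus_copies, paxgene_copies)]
    return exp(sum(log(r) for r in ratios) / len(ratios))

# Hypothetical copy numbers for one transcript in two donors.
example_fold = paired_fold_change([300.0, 600.0], [200.0, 400.0])
```

Pairing the tubes by donor keeps donor-to-donor variation in expression from being mistaken for a preservative effect.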

The accurate diagnosis of CAD via a blood-based test offers several major advantages in clinical medicine. First, it avoids invasive testing, such as cardiac catheterization, and allows more judicious use of imaging resources, such as CT and MR angiography. Second, it can improve diagnosis of CAD in rural areas worldwide, where advanced imaging methods are unavailable. Finally, the individual biomarkers within the biomarker panel can serve both as therapeutic targets and as markers to monitor the efficacy of new or existing therapies, such as statins. For instance, Treg numbers have been shown to be responsive to statin therapy, and so it might be possible to use TRACs to monitor therapies.

The connection between the immune system and atherosclerosis is well known. Blood components, especially monocytes/macrophages, neutrophils, lymphocytes, and platelets mechanistically contribute to the development of CAD. The present results are consistent with the extensive evidence that CAD is associated with changes in the Treg/Teff ratio, which are thought to be mechanistically related to atherosclerotic progression.

Mechanistically, the present results suggest that changes in the immune system may correlate with the presence of CAD, which has been a substantial line of investigation for many years, progressively building connections between lipid imbalances, inflammation, microbiome changes, and autoimmunity in atherosclerosis. There is a large and fairly consistent literature demonstrating changes in the Treg/Teff ratio in patients with CAD, and those observed cellular changes would be consistent in both direction and magnitude with the current changes in mRNA expression. One interpretation of the beneficial effects of statins is that in addition to lowering LDL cholesterol, statins can induce FoxP3+ Treg cells, via modulation of TGF-β signaling. Beyond the reproducible clinical correlation, experimental manipulation of Treg levels in mouse models of atherosclerosis lends credence to a potentially causal relationship. Furthermore, it has been suggested that an immune modulatory approach may offer therapeutic potential for atherosclerosis. It is plausible that a stimulatory effect of statins on Treg number could contribute to their anti-atherogenic effect, an effect that has not yet been reported for PCSK9 inhibitors.

An additional relationship can be seen between Treg dysfunction and atherosclerosis through the well-known incidence of atherosclerosis in various autoimmune diseases, but most notably in the case of systemic lupus erythematosus (SLE). While the relationship between Treg and SLE is complex, there is a general consensus that deficient Treg numbers and/or function is one element of SLE, and thus, might also be a component of SLE-associated atherosclerosis. Likewise, psoriasis and psoriatic arthritis, which are associated with Treg imbalances, have well-established associations with atherosclerosis. The immune-CAD connection is further strengthened by an apparently causal relationship in immune-mediated transplant arteriosclerosis.

Potentially the strongest evidence for the immune-CAD connection derives from the proven efficacy of rapamycin, and related compounds which are antibiotics/immunosuppressants, to block restenosis. Rapamycin is known to increase Treg numbers and function at clinically relevant levels.

Many of the TRACs have known relationships with Treg function. Foremost, the forkhead-box transcription factor FoxP3 is considered the definitive marker for the Treg subset, and is thought to transcriptionally regulate a set of transcripts involved in Treg function. FoxP3 is itself epigenetically controlled by promoter demethylation to allow stable expression in Treg, and in turn regulates Treg-specific transcription via known promoter binding sites. Other studies have implicated two isoforms of diacylglycerol kinase (DGKA, DGKZ) in T cell anergy, and DGKZ in the generation of natural Tregs via modulation of NFkB signaling through c-rel. The FoxP3 transcription factor is known to interact with Ets transcription factors, and prior work on aortic aneurysm transcript profiling demonstrated that this atherosclerotic condition was associated with coordinated changes in Ets transcription factor-dependent signaling in the aortic tissue. Zap70, the zeta chain associated protein, is preferentially phosphorylated in Treg cells during TCR engagement.

While the underlying cause of the Treg imbalance is currently unknown, there is increasing evidence that changes in the microbiome could alter the type of short-chain fatty acids released into the circulation, thereby modulating both GI and circulating Treg differentiation.

In addition to forming the basis of a diagnostic test, it is likely that TRACs can help to elucidate some of the underlying components of CAD, and possibly new targets for therapeutics, in particular by modulating the Treg/Teff ratio. Through high-throughput screening, dozens of FDA-approved compounds that stimulate Treg generation can be rapidly assessed.

The transcript profile identified here can be adapted to creating a screening test for CAD in asymptomatic individuals, where early detection would allow risk modification and drug therapy. Using microarrays, we've used a similar blood RNA approach to generate new insights into the origin of other cardiovascular diseases, such as adriamycin cardiotoxicity and aspirin resistance (AR), and made the innovative observation that AR is potentially due to an autoimmune antiplatelet syndrome.

There are certain limitations to the present studies: 1) Confounding variables: the TRAC signature could be related to a risk factor or drug treatment that differs between groups. Based on the clinical variables collected, we cannot identify a co-morbid condition or prescription drug use that would differ sufficiently to create this effect, but it is difficult to completely rule it out. 2) Specificity: It is possible that the TRACs would detect disease in any artery, but even if the test were detecting atherosclerosis in other arteries, it would still have tremendous diagnostic value. The combination of chest pain and a positive TRAC profile in blood would be strong evidence that catheterization, or other follow-up diagnostics, is warranted. Likewise, numbness in the legs, combined with a positive TRAC test, would be sufficient to justify peripheral vascular ultrasound. The Cardiogram group observed that their predictors for CAD also predicted aortic atherosclerosis, suggesting that blood markers are likely to be general predictors of atherosclerosis. Prior work has suggested that the location of atherosclerosis is effectively an endophenotype associated with specific risk factors in a given patient, which one can speculate may be related to the particular autoantigens that are prevalent in particular vascular beds.

What is claimed is:
 1. A method for determining an increased likelihood of having or developing coronary artery disease, the method comprising: obtaining a blood sample from a subject; extracting RNA from the blood sample; quantitating RNA levels in at least a portion of the extracted RNA; calculating expression levels for a plurality of RNA transcripts in the extracted RNA; comparing the expression levels for the plurality of TRAC RNA transcripts to control expression levels for the same RNA transcripts; and determining whether the expression levels for the plurality of TRAC RNA transcripts in the extracted RNA are, collectively, significantly different from the control expression levels for the same RNA transcripts, wherein a significant difference in expression levels indicates that the subject has an increased likelihood of having or developing coronary artery disease.
 2. The method of claim 1, wherein the blood sample comprises whole blood.
 3. The method of claim 1, further comprising depleting ribosomal RNA from the extracted RNA prior to quantitating at least the portion of the extracted RNA.
 4. The method of claim 3, wherein depleting ribosomal RNA from the extracted RNA comprises treating the extracted RNA with probes that are specific for ribosomal RNA.
 5. The method of claim 1, wherein the plurality of RNA transcripts include FOXP3, CD4, CD25, ETS1, and RUNX1, and wherein expression levels for each of FOXP3, CD4, CD25, ETS1, and RUNX1 are down-regulated when there is a significant difference in expression levels.
 6. The method of claim 5, wherein the expression levels for each of FOXP3, CD4, CD25, ETS1, and RUNX1 are down-regulated between 1.5-fold and 2-fold when there is a significant difference in expression levels.
 7. The method of claim 1, wherein the plurality of RNA transcripts include DGKA, DLG1, ICOSLG, IKZF4/Eos, SMYD3, TCF3 and TRIM28, and wherein expression levels for each of DGKA, DLG1, ICOSLG, IKZF4/Eos, SMYD3, TCF3 and TRIM28 are down-regulated when there is a significant difference in expression levels.
 8. The method of claim 7, wherein the expression levels for each of DGKA, DLG1, ICOSLG, IKZF4/Eos, SMYD3, TCF3 and TRIM28 are down-regulated between 1.5-fold and 2-fold when there is a significant difference in expression levels.
 9. The method of claim 1, wherein the subject is human.
 10. The method of claim 1, further comprising selecting a subject that has been identified as having one or more risk factors for coronary artery disease prior to obtaining the blood sample from the subject.
 11. The method of claim 10, wherein the risk factors are selected from the group consisting of hypertension, advanced age, elevated cholesterol, and a family history of coronary artery disease.
 12. The method of claim 1, wherein the method identifies coronary artery disease with a positive predictive value of greater than 80%.
 13. The method of claim 12, wherein the method identifies coronary artery disease with a positive predictive value of greater than 90%.
 14. The method of claim 1, further comprising treating the subject with one or more of a statin, aspirin, an ACE inhibitor, a diuretic, a glucose control medication, or a beta blocker based on the determination that the subject has an increased likelihood of having or developing coronary artery disease.
 15. The method of claim 1, further comprising subjecting the subject to coronary angiography based on the determination that the subject has an increased likelihood of having or developing coronary artery disease.
 16. The method of claim 1, wherein the control expression levels are determined based on empirical expression level data from individuals who do not have coronary artery disease.
 17. The method of claim 16, wherein the individuals who do not have coronary artery disease have been identified as not having coronary artery disease by coronary catheterization angiography.
 18. A method for determining an increased likelihood of having or developing coronary artery disease, the method comprising: obtaining RNA that has been extracted from a blood sample of a subject; determining expression levels for a plurality of biomarkers in the extracted RNA; comparing the expression levels for the plurality of biomarkers in the extracted RNA to the expression levels of the same biomarkers from a control; and determining whether the expression levels for the plurality of biomarkers in the extracted RNA are, collectively, significantly different from the expression levels for the same biomarkers in the control; wherein at least 20% of the plurality of biomarkers are selected from the group consisting of AML1, CD4, CD3E, CD25, CTLA4, DGKA, DLG1, ETS1, FOXP3, ICOSLG, IKZF4/Eos, IL2RA, IL2RB, IL2RG, SMYD3, RUNX1, TCF3, TRIM28, and ZAP70.
 19. The method of claim 18, further comprising performing coronary catheterization angiography if the expression levels for the plurality of biomarkers are significantly different from the expression levels for the same biomarkers in the control.
 20. The method of claim 18, wherein more than 80% of the biomarkers are, individually, differentially down-regulated more than 1.5-fold.
 21. The method of claim 18, wherein the method identifies coronary artery disease with a positive predictive value of greater than 90%.
 22. The method of claim 18, further comprising treating the subject with one or more of a statin, aspirin, an ACE inhibitor, a diuretic, or a beta blocker based on the determination that the subject has an increased likelihood of having or developing coronary artery disease.
 23. The method of claim 18, wherein the subject is human.
 24. A method of analyzing an RNA sample from a subject, the method comprising: determining expression levels for a plurality of biomarkers in the RNA sample, wherein the plurality of biomarkers comprises biomarkers for which expression levels in individuals with coronary artery disease differ from expression levels in individuals without coronary artery disease; wherein the level of expression of the plurality of biomarkers in comparison to a control indicates that the subject has an increased likelihood of having or developing coronary artery disease.
 25. The method of claim 24, wherein the RNA sample is an RNA sample that has been isolated from whole blood.
 26. The method of claim 24, wherein the control is a database comprising expression levels of the plurality of biomarkers in individuals without coronary artery disease.
 27. The method of claim 24, wherein the plurality of biomarkers comprises the transcripts listed in Table 8.
 28. The method of claim 24, wherein the plurality of biomarkers comprises the transcripts of Table 7.
 29. The method of claim 24, wherein the plurality of biomarkers comprises the transcripts of Table 6.
 30. The method of claim 24, wherein the plurality of biomarkers comprises the transcripts of Table 5.
 31. The method of claim 24, wherein the plurality of biomarkers comprises the transcripts of Table 4.
 32. The method of claim 24, wherein the plurality of biomarkers comprises the transcripts of Table 3.
 33. The method of claim 24, wherein the plurality of biomarkers comprises the transcripts of Table 2.
 34. The method of claim 24, wherein the plurality of biomarkers comprises the transcripts of Table 61.
 35. The method of claim 24, wherein the plurality of biomarkers comprises at least 50%, 60%, 70%, 80%, 90%, and/or 95% of the transcripts of any of the biomarkers listed in Tables 3-10.
 36. The method of claim 24, wherein at least 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90% and/or 95% of the plurality of biomarkers are down-regulated at least 1.5-fold when the subject is determined to have the increased likelihood of having or developing coronary artery disease.
 37. The method of claim 24, wherein at least 50%, 60%, 70%, 80%, 90%, and/or 95% of the plurality of biomarkers are selected from any of the biomarkers listed in Tables 3-10.
 38. A method of treating a subject that has been identified as having one or more risk factors for coronary artery disease, the method comprising: analyzing an RNA sample using the method of claim 24; and administering a medicament to or carrying out a medical procedure on the subject on the basis of the determination that the subject has an increased likelihood of having or developing coronary artery disease.
 39. The method of claim 38, wherein the risk factors are selected from the group consisting of hypertension, advanced age, elevated cholesterol, and a family history of coronary artery disease.
 40. A method for determining an increased likelihood of having or developing coronary artery disease, the method comprising: obtaining a blood sample from a subject; extracting protein from the blood sample; quantitating protein levels in at least a portion of the extracted protein; calculating the concentrations of a plurality of proteins in the extracted protein; comparing the expression levels for the plurality of proteins to control levels for the same proteins; and determining whether the levels for the plurality of proteins in the extracted protein sample are, collectively, significantly different from the control levels for the same proteins, wherein a significant difference in protein levels indicates that the subject has an increased likelihood of having or developing coronary artery disease.
 41. The method of claim 40, wherein the plurality of proteins include AML1, CD4, CD3E, CD25, CTLA4, DGKA, DLG1, ETS1, FOXP3, ICOSLG, IKZF4/Eos, IL2RA, IL2RB, IL2RG, SMYD3, RUNX1, TCF3, TRIM28, and ZAP70.
 42. A method of analyzing a protein sample from a subject, the method comprising: determining expression levels for a plurality of biomarkers in the protein sample, wherein the plurality of biomarkers comprises biomarkers for which expression levels in individuals with coronary artery disease differ from expression levels in individuals without coronary artery disease; wherein the level of expression of the plurality of biomarkers in comparison to a control indicates that the subject has an increased likelihood of having or developing coronary artery disease.
 43. The method of claim 42, wherein the plurality of biomarkers comprises at least 50%, 60%, 70%, 80%, 90%, and/or 95% of the proteins encoded by any of the biomarkers listed in Tables 3-10.
 44. The method of claim 42, wherein at least 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90% and/or 95% of the plurality of biomarkers are down-regulated at least 1.5-fold when the subject is determined to have the increased likelihood of having or developing coronary artery disease.
 45. The method of claim 42, wherein at least 50%, 60%, 70%, 80%, 90%, and/or 95% of the plurality of biomarkers are selected from any of the biomarkers listed in Tables 3-10.
 46. A method of treating a subject that has been identified as having one or more risk factors for coronary artery disease, the method comprising: analyzing a protein sample using the method of claim 42; and administering a medicament to or carrying out a medical procedure on the subject on the basis of the determination that the subject has an increased likelihood of having or developing coronary artery disease.
 47. The method of claim 46, wherein the risk factors are selected from the group consisting of hypertension, advanced age, elevated cholesterol, and a family history of coronary artery disease. 